NSF PAR Search | NSF Public Access Repository

The Case for External Graph Sketching

https://doi.org/10.1137/1.9781611978759.9

Bender, Michael A; Farach-Colton, Martín; Jacob, Riko; Komlós, Hanna; Tench, David; West, Evan T (January 2025, Society for Industrial and Applied Mathematics)

Full Text Available
Exploring the Landscape of Distributed Graph Sketching

Tench, David; West, Evan; Zhang, Kenny; Bender, Michael A; Delayo, Daniel; Farach-Colton, Martin; Gill, Gilvir; Seip, Tyler; Zhang, Victor (January 2025, SIAM)

Full Text Available
Exploring the Landscape of Distributed Graph Sketching

https://doi.org/10.1137/1.9781611978339.11

Tench, David; West, Evan T; Zhang, Kenny; Bender, Michael A; DeLayo, Daniel; Farach-Colton, Martín; Gill, Gilvir; Seip, Tyler; Zhang, Victor (January 2025, Society for Industrial and Applied Mathematics)

Full Text Available
GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)

https://doi.org/10.1145/3643846

Tench, David; West, Evan; Zhang, Victor; Bender, Michael A; Chowdhury, Abiyaz; Delayo, Daniel; Dellas, J Ahmed; Farach-Colton, Martín; Seip, Tyler; Zhang, Kenny (September 2024, ACM Transactions on Database Systems)

Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing the connected components on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is an inherent limitation of this approach and is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM. We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call GraphZeppelin, uses new linear sketching data structures (CubeSketch) to solve the streaming connected components problem and as a result requires space asymptotically smaller than the space required for a lossless representation of the graph. GraphZeppelin is optimized for massive dense graphs: GraphZeppelin can process millions of edge updates (both insertions and deletions) per second, even when the underlying graph is far too large to fit in available RAM. As a result, GraphZeppelin vastly increases the scale of graphs that can be processed.
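The abstract above credits linear sketching for handling both insertions and deletions in small space. As a rough illustration only, the following Python sketch shows the linearity idea with a toy XOR-based structure that can recover an edge when exactly one remains. It is not CubeSketch and not GraphZeppelin's code; the names ToyVertexSketch, edge_id, checksum, and apply_update are hypothetical, and a real sketch would add subsampling so that recovery also works when many edges remain.

```python
import hashlib


def edge_id(u, v, n):
    """Encode undirected edge {u, v} on n vertices as a nonzero integer."""
    a, b = min(u, v), max(u, v)
    return a * n + b + 1  # +1 so that 0 can stand for "no edge"


def checksum(eid):
    """Deterministic 64-bit hash used to validate a recovered edge id."""
    digest = hashlib.sha256(str(eid).encode()).digest()
    return int.from_bytes(digest[:8], "big")


class ToyVertexSketch:
    """Toy linear sketch over XOR: succeeds only if exactly one edge remains."""

    def __init__(self):
        self.alpha = 0  # XOR of the ids of all edges currently present
        self.gamma = 0  # XOR of the checksums of those ids

    def toggle(self, eid):
        # Insertion and deletion are the same operation under XOR.
        self.alpha ^= eid
        self.gamma ^= checksum(eid)

    def merge(self, other):
        # Linearity: the sketch of a vertex set is the XOR of its members'
        # sketches, so edges with both endpoints inside the set cancel.
        out = ToyVertexSketch()
        out.alpha = self.alpha ^ other.alpha
        out.gamma = self.gamma ^ other.gamma
        return out

    def recover(self):
        """Return the single remaining edge id, or None if recovery fails."""
        if self.alpha != 0 and checksum(self.alpha) == self.gamma:
            return self.alpha
        return None


def apply_update(sketches, u, v, n):
    """Apply one stream update (insert or delete) for edge {u, v}."""
    eid = edge_id(u, v, n)
    sketches[u].toggle(eid)
    sketches[v].toggle(eid)


if __name__ == "__main__":
    n = 4
    sketches = [ToyVertexSketch() for _ in range(n)]
    # Triangle on {0, 1, 2}, one edge to vertex 3, and edge {1, 3}
    # inserted and then deleted.
    stream = [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3), (1, 3)]
    for u, v in stream:
        apply_update(sketches, u, v, n)
    merged = sketches[0].merge(sketches[1]).merge(sketches[2])
    print(merged.recover() == edge_id(2, 3, n))
```

In the demo, merging the sketches of vertices 0, 1, and 2 cancels the triangle's internal edges and the inserted-then-deleted edge, so the only edge left to recover is {2, 3}, which leaves the merged set. Repeatedly finding such outgoing edges and merging components is the general pattern that sketch-based connectivity algorithms follow.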
Full Text Available
Mosaic Pages: Big TLB Reach With Small Pages

https://doi.org/10.1109/MM.2024.3409181

Han, Jaehyun; Gosakan, Krishnan; Kuszmaul, William; Mubarek, Ibrahim N; Mukherjee, Nirjhar; Sriram, Karthik; Tagliavini, Guido; West, Evan; Bender, Michael A; Bhattacharjee, Abhishek; et al (July 2024, IEEE Micro)

Full Text Available
Mosaic Pages: Big TLB Reach with Small Pages

https://doi.org/10.1145/3582016.3582021

Gosakan, Krishnan; Han, Jaehyun; Kuszmaul, William; Mubarek, Ibrahim N.; Mukherjee, Nirjhar; Sriram, Karthik; Tagliavini, Guido; West, Evan; Bender, Michael A.; Bhattacharjee, Abhishek; et al (March 2023, ASPLOS)

Full Text Available
GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph Streams

https://doi.org/10.1145/3514221.3526146

Tench, David; West, Evan; Zhang, Victor; Bender, Michael A.; Chowdhury, Abiyaz; Dellas, J. Ahmed; Farach-Colton, Martin; Seip, Tyler; Zhang, Kenny (June 2022, Proc. International Conference on Management of Data (SIGMOD))

Full Text Available
GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph Streams

Tench, David; West, Evan; Zhang, Victor; Bender, Michael; Chowdhury, Abiyaz; Dellas, Ahmed; Farach-Colton, Martin; Seip, Tyler; Zhang, Kenny (January 2022, SIGMOD record)

Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing the connected components on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM. We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call GraphZeppelin, uses new linear sketching data structures (CubeSketch) to solve the streaming connected components problem and as a result requires space asymptotically smaller than the space required for a lossless representation of the graph. GraphZeppelin is optimized for massive dense graphs: GraphZeppelin can process millions of edge updates (both insertions and deletions) per second, even when the underlying graph is far too large to fit in available RAM. As a result GraphZeppelin vastly increases the scale of graphs that can be processed.
Full Text Available
Paging and the Address-Translation Problem

Bender, Michael; Bhattacharjee, Abhishek; Conway, Alex; Farach-Colton, Martin; Johnson, Rob; Kuszmaul, William; Porter, Don; Tagliavini, Guido; Vorobyeva, Janet; West, Evan (January 2021, Annual ACM Symposium on Parallelism in Algorithms and Architectures)

The classical paging problem, introduced by Sleator and Tarjan in 1985, formalizes the problem of caching pages in RAM in order to minimize IOs. Their online formulation ignores the cost of address translation: programs refer to data via virtual addresses, and these must be translated into physical locations in RAM. Although the cost of an individual address translation is much smaller than that of an IO, every memory access involves an address translation, whereas IOs can be infrequent. In practice, one can spend money to avoid paging by over-provisioning RAM; in contrast, address translation is effectively unavoidable. Thus address-translation costs can sometimes dominate paging costs, and systems must simultaneously optimize both. To mitigate the cost of address translation, all modern CPUs have translation lookaside buffers (TLBs), which are hardware caches of common address translations. What makes TLBs interesting is that a single TLB entry can potentially encode the address translation for many addresses. This is typically achieved via the use of huge pages, which translate runs of contiguous virtual addresses to runs of contiguous physical addresses. Huge pages reduce TLB misses at the cost of increasing the IOs needed to maintain contiguity in RAM. This tradeoff between TLB misses and IOs suggests that the classical paging problem does not tell the full story. This paper introduces the Address-Translation Problem, which formalizes the problem of maintaining a TLB, a page table, and RAM in order to minimize the total cost of both TLB misses and IOs. We present an algorithm that achieves the benefits of huge pages for TLB misses without the downsides of huge pages for IOs.
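The claim that one TLB entry can cover many addresses when pages are huge can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: it models the TLB as an LRU cache (the abstract does not specify a replacement policy), uses a hypothetical 64-entry TLB with 4 KiB and 2 MiB page sizes, and ignores the IO cost side of the tradeoff entirely. It is not the algorithm the paper presents, and the function name tlb_misses is made up for this example.

```python
from collections import OrderedDict

KIB = 1024
MIB = 1024 * KIB


def tlb_misses(addresses, page_size, tlb_entries):
    """Count misses of an LRU-managed TLB over a trace of virtual addresses."""
    tlb = OrderedDict()  # keys are virtual page numbers, oldest first
    misses = 0
    for addr in addresses:
        page = addr // page_size
        if page in tlb:
            tlb.move_to_end(page)  # hit: mark as most recently used
        else:
            misses += 1
            tlb[page] = True
            if len(tlb) > tlb_entries:
                tlb.popitem(last=False)  # evict the least recently used entry
    return misses


# One sequential pass over 8 MiB, touching one address every 4 KiB.
trace = list(range(0, 8 * MIB, 4 * KIB))

small_page_misses = tlb_misses(trace, page_size=4 * KIB, tlb_entries=64)
huge_page_misses = tlb_misses(trace, page_size=2 * MIB, tlb_entries=64)

if __name__ == "__main__":
    print(small_page_misses, huge_page_misses)
```

On this trace every access lands on a new 4 KiB page, so the small-page run misses on all 2048 accesses, while the 2 MiB run misses only once per huge page, four times in total. What the simulation leaves out is exactly what the abstract highlights: keeping 2 MiB regions contiguous in RAM can force extra IOs.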
Full Text Available
Paging and the Address-Translation Problem

https://doi.org/10.1145/3409964.3461814

Bender, Michael A.; Bhattacharjee, Abhishek; Conway, Alex; Farach-Colton, Martín; Johnson, Rob; Kannan, Sudarsun; Kuszmaul, William; Mukherjee, Nirjhar; Porter, Don; Tagliavini, Guido; et al (January 2021, 33rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA))
Full Text Available